Overview

Dataset statistics

Number of variables: 17
Number of observations: 546651
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 84918
Duplicate rows (%): 15.5%
Total size in memory: 70.9 MiB
Average record size in memory: 136.0 B
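The two derived figures above come from simple arithmetic. The duplicate share is 84918 / 546651 ≈ 15.5%. The average record size is 70.9 MiB / 546651 rows ≈ 136.0 B. Below is a minimal stdlib sketch of both calculations. `count_duplicate_rows` and `average_record_size` are hypothetical helpers. The duplicate rule the sketch uses is also an assumption: it counts every row that repeats an earlier identical row, as pandas `DataFrame.duplicated()` does with its default `keep="first"`. The report does not state how pandas-profiling counts duplicates.

```python
from typing import Hashable, Iterable, Sequence


def count_duplicate_rows(rows: Iterable[Sequence[Hashable]]) -> tuple[int, float]:
    """Count rows identical to an earlier row (first occurrences are not counted)."""
    seen: set = set()
    total = 0
    duplicates = 0
    for row in rows:
        key = tuple(row)
        total += 1
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    share = duplicates / total if total else 0.0
    return duplicates, share


def average_record_size(total_bytes: float, n_rows: int) -> float:
    """Average in-memory size per row, in bytes."""
    return total_bytes / n_rows


# Figures taken from the dataset statistics above.
print(round(average_record_size(70.9 * 1024**2, 546_651), 1))  # 136.0
print(f"{84918 / 546651:.1%}")  # 15.5%
```

Because the helper works on plain tuples, it needs no DataFrame. With pandas installed, `df.duplicated().sum()` gives the same count under the same `keep="first"` rule.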

Variable types

Numeric: 9
Categorical: 8

Alerts

Dataset has 84918 (15.5%) duplicate rows [Duplicates]
signup_flow is highly correlated with affiliate_channel and 1 other field [High correlation]
days_from_account_created_until_first_booking is highly correlated with day_first_booking and 2 other fields [High correlation]
day_first_booking is highly correlated with days_from_account_created_until_first_booking and 2 other fields [High correlation]
day_of_week_first_booking is highly correlated with day_of_week_account_created [High correlation]
year_account_created is highly correlated with days_from_account_created_until_first_booking and 1 other field [High correlation]
day_account_created is highly correlated with day_first_booking [High correlation]
day_of_week_account_created is highly correlated with day_of_week_first_booking [High correlation]
week_of_year_account_created is highly correlated with year_account_created [High correlation]
affiliate_channel is highly correlated with signup_flow and 2 other fields [High correlation]
first_affiliate_tracked is highly correlated with affiliate_channel [High correlation]
signup_app is highly correlated with signup_flow and 1 other field [High correlation]
country_destination is highly correlated with days_from_account_created_until_first_booking and 1 other field [High correlation]
days_from_first_active_until_account_created is highly skewed (γ1 = 67.7710734) [Skewed]
signup_flow has 440570 (80.6%) zeros [Zeros]
days_from_first_active_until_account_created has 545687 (99.8%) zeros [Zeros]
days_from_account_created_until_first_booking has 122231 (22.4%) zeros [Zeros]
day_of_week_first_booking has 127650 (23.4%) zeros [Zeros]
day_of_week_account_created has 89920 (16.4%) zeros [Zeros]
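Most of the correlation alerts involve at least one categorical variable, for example affiliate_channel or signup_app. The report does not say which coefficient triggered each alert. pandas-profiling can compute several coefficients, and Cramér's V is a common choice for a pair of categorical variables. Below is a minimal stdlib sketch of Cramér's V without bias correction. `cramers_v` is a hypothetical helper, and the toy labels are illustrative only.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Cramér's V between two categorical sequences (no bias correction)."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # observed contingency-table cells
    row_totals = Counter(x)
    col_totals = Counter(y)
    chi2 = 0.0
    for a, ra in row_totals.items():
        for b, cb in col_totals.items():
            expected = ra * cb / n  # expected count under independence
            observed = joint[(a, b)]  # Counter returns 0 for unseen pairs
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))


print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))  # 1.0
```

V is 0 when the two columns are independent and 1 when one column determines the other. The report's "highly correlated" threshold is configurable in pandas-profiling and is not shown here.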

Reproduction

Analysis started: 2022-09-09 20:13:41.933881
Analysis finished: 2022-09-09 20:14:27.694805
Duration: 45.76 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

signup_flow
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 26
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.79915522
Minimum: 0
Maximum: 25
Zeros: 440570
Zeros (%): 80.6%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB
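The overview rows above are plain counts over the column, and Mean is Sum divided by the row count (983510 / 546651 ≈ 1.79915522 for signup_flow). Below is a minimal stdlib sketch that derives the same rows from a list of numbers. `overview_counts` is a hypothetical helper, not a pandas-profiling function.

```python
from typing import Iterable


def overview_counts(values: Iterable[float]) -> dict[str, float]:
    """Distinct, zero and negative counts (with percentages) plus the mean."""
    data = list(values)
    n = len(data)
    zeros = sum(1 for v in data if v == 0)
    negatives = sum(1 for v in data if v < 0)
    return {
        "distinct": len(set(data)),
        "zeros": zeros,
        "zeros_pct": 100 * zeros / n,
        "negative": negatives,
        "negative_pct": 100 * negatives / n,
        "mean": sum(data) / n,
    }


# Mean recomputed from the report's Sum and row count for signup_flow.
print(round(983510 / 546651, 8))  # 1.79915522
```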

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 18
Maximum: 25
Range: 25
Interquartile range (IQR): 0
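Because 80.6% of signup_flow equals 0, every quantile up to the 80th percentile is 0. That is why Q1, the median and Q3 all equal 0 and the IQR is 0, even though the maximum is 25. The sketch below computes quantiles by linear interpolation between sorted values, which is the default rule of pandas `Series.quantile`. Whether pandas-profiling uses the same rule is an assumption, and `quantile` and `iqr` are hypothetical helpers.

```python
import math
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile (same rule as the pandas default)."""
    if not values or not 0.0 <= q <= 1.0:
        raise ValueError("need a non-empty sequence and 0 <= q <= 1")
    data = sorted(values)
    pos = q * (len(data) - 1)  # fractional rank in the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    frac = pos - lo
    return data[lo] + (data[hi] - data[lo]) * frac


def iqr(values: Sequence[float]) -> float:
    """Interquartile range Q3 - Q1."""
    return quantile(values, 0.75) - quantile(values, 0.25)


sample = [0, 0, 0, 0, 0, 0, 0, 0, 3, 25]  # toy column dominated by zeros
print(quantile(sample, 0.25), quantile(sample, 0.75), iqr(sample))  # 0.0 0.0 0.0
```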

Descriptive statistics

Standard deviation: 5.542365681
Coefficient of variation (CV): 3.08053781
Kurtosis: 10.61813579
Mean: 1.79915522
Median Absolute Deviation (MAD): 0
Skewness: 3.431729962
Sum: 983510
Variance: 30.71781734
Monotonicity: Not monotonic
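Several rows in this block are derived from others. The coefficient of variation is the standard deviation divided by the mean (5.542365681 / 1.79915522 ≈ 3.0805), and the variance is the squared standard deviation. MAD here is the median of absolute deviations from the median. It is 0 because more than half of the values equal the median of 0. Below is a stdlib sketch. It assumes the sample (n - 1) standard deviation that pandas uses by default, and the helper names are hypothetical.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation (n - 1 denominator) divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def median_absolute_deviation(values: Sequence[float]) -> float:
    """Median of absolute deviations from the median (unscaled)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


# CV reported for signup_flow, recomputed from its std and mean.
print(round(5.542365681 / 1.79915522, 4))  # 3.0805
```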
[Figure: histogram with fixed-size bins (bins=26)]
Common values
Value | Count | Frequency (%)
0 | 440570 | 80.6%
1 | 23979 | 4.4%
2 | 20907 | 3.8%
25 | 13496 | 2.5%
3 | 11125 | 2.0%
12 | 9423 | 1.7%
24 | 8033 | 1.5%
23 | 3111 | 0.6%
5 | 1616 | 0.3%
4 | 1605 | 0.3%
Other values (16) | 12786 | 2.3%

Minimum 10 values
Value | Count | Frequency (%)
0 | 440570 | 80.6%
1 | 23979 | 4.4%
2 | 20907 | 3.8%
3 | 11125 | 2.0%
4 | 1605 | 0.3%
5 | 1616 | 0.3%
6 | 1460 | 0.3%
7 | 1291 | 0.2%
8 | 1395 | 0.3%
9 | 1161 | 0.2%

Maximum 10 values
Value | Count | Frequency (%)
25 | 13496 | 2.5%
24 | 8033 | 1.5%
23 | 3111 | 0.6%
22 | 628 | 0.1%
21 | 834 | 0.2%
20 | 426 | 0.1%
19 | 432 | 0.1%
18 | 443 | 0.1%
17 | 469 | 0.1%
16 | 429 | 0.1%

days_from_first_active_until_account_created
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 324
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2278418955
Minimum: 0
Maximum: 1456
Zeros: 545687
Zeros (%): 99.8%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 1456
Range: 1456
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 10.34201202
Coefficient of variation (CV): 45.39117795
Kurtosis: 5689.364061
Mean: 0.2278418955
Median Absolute Deviation (MAD): 0
Skewness: 67.7710734
Sum: 124550
Variance: 106.9572126
Monotonicity: Not monotonic
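The SKEWED alert quotes this skewness value. With 99.8% zeros and a maximum of 1456, the few large values form an extreme right tail. pandas' `Series.skew()` and `Series.kurt()` return the bias-corrected sample skewness (G1) and excess kurtosis (G2). This sketch assumes the report reuses those estimators, which is plausible for a pandas-based tool but not stated here. It implements both formulas with the standard library, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def _central_moments(values: Sequence[float]) -> tuple[int, float, float, float]:
    """Return n and the biased central moments m2, m3, m4."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d**2 for d in dev) / n
    m3 = sum(d**3 for d in dev) / n
    m4 = sum(d**4 for d in dev) / n
    return n, m2, m3, m4


def sample_skewness(values: Sequence[float]) -> float:
    """Adjusted Fisher-Pearson skewness G1 (needs n >= 3, non-constant data)."""
    n, m2, m3, _ = _central_moments(values)
    g1 = m3 / m2**1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def sample_excess_kurtosis(values: Sequence[float]) -> float:
    """Bias-corrected excess kurtosis G2 (needs n >= 4, non-constant data)."""
    n, m2, _, m4 = _central_moments(values)
    g2 = m4 / m2**2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


spike = [0, 0, 0, 1]  # mostly zeros with one positive value
print(sample_skewness(spike), sample_excess_kurtosis(spike))
```

A column that is almost entirely zero with a handful of large values drives both statistics up sharply. This matches the skewness of 67.77 and kurtosis of 5689 reported above.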
[Figure: histogram with fixed-size bins (bins=50)]
Common values
Value | Count | Frequency (%)
0 | 545687 | 99.8%
1 | 91 | < 0.1%
2 | 76 | < 0.1%
4 | 63 | < 0.1%
3 | 61 | < 0.1%
5 | 16 | < 0.1%
6 | 12 | < 0.1%
13 | 12 | < 0.1%
10 | 9 | < 0.1%
16 | 8 | < 0.1%
Other values (314) | 616 | 0.1%

Minimum 10 values
Value | Count | Frequency (%)
0 | 545687 | 99.8%
1 | 91 | < 0.1%
2 | 76 | < 0.1%
3 | 61 | < 0.1%
4 | 63 | < 0.1%
5 | 16 | < 0.1%
6 | 12 | < 0.1%
7 | 5 | < 0.1%
8 | 4 | < 0.1%
9 | 8 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1456 | 1 | < 0.1%
1369 | 1 | < 0.1%
1361 | 1 | < 0.1%
1148 | 1 | < 0.1%
1036 | 1 | < 0.1%
1011 | 1 | < 0.1%
1006 | 1 | < 0.1%
993 | 1 | < 0.1%
984 | 1 | < 0.1%
962 | 1 | < 0.1%

days_from_account_created_until_first_booking
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 1971
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 116.1756386
Minimum: -349
Maximum: 2001
Zeros: 122231
Zeros (%): 22.4%
Negative: 231
Negative (%): < 0.1%
Memory size: 4.2 MiB

Quantile statistics

Minimum: -349
5th percentile: 0
Q1: 1
Median: 6
Q3: 104
95th percentile: 673
Maximum: 2001
Range: 2350
Interquartile range (IQR): 103

Descriptive statistics

Standard deviation: 241.2298612
Coefficient of variation (CV): 2.076423803
Kurtosis: 9.71243456
Mean: 116.1756386
Median Absolute Deviation (MAD): 6
Skewness: 2.964137586
Sum: 63507529
Variance: 58191.84595
Monotonicity: Not monotonic
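The 231 negative values (231 / 546651 ≈ 0.04%) are rows whose first booking is dated before account creation. The report only counts them and does not establish whether they are errors. If a downstream analysis chooses to set them aside, a minimal sketch follows. The dict-per-row layout and the helper `split_negative_offsets` are hypothetical.

```python
from typing import Iterable, Mapping

Row = Mapping[str, float]


def split_negative_offsets(
    rows: Iterable[Row],
    column: str = "days_from_account_created_until_first_booking",
) -> tuple[list[Row], list[Row]]:
    """Separate rows whose day offset is negative from the rest."""
    kept: list[Row] = []
    negative: list[Row] = []
    for row in rows:
        (negative if row[column] < 0 else kept).append(row)
    return kept, negative


rows = [
    {"days_from_account_created_until_first_booking": 0},
    {"days_from_account_created_until_first_booking": -349},
    {"days_from_account_created_until_first_booking": 104},
]
kept, negative = split_negative_offsets(rows)
print(len(kept), len(negative))  # 2 1
```

Keeping the negative rows in a separate list, rather than dropping them, leaves room to inspect them before deciding how to treat them.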
[Figure: histogram with fixed-size bins (bins=50)]
Common values
Value | Count | Frequency (%)
0 | 122231 | 22.4%
1 | 67733 | 12.4%
2 | 30398 | 5.6%
3 | 19576 | 3.6%
4 | 14563 | 2.7%
5 | 11686 | 2.1%
6 | 10191 | 1.9%
7 | 9405 | 1.7%
8 | 7926 | 1.4%
9 | 6553 | 1.2%
Other values (1961) | 246389 | 45.1%

Minimum 10 values
Value | Count | Frequency (%)
-349 | 1 | < 0.1%
-347 | 1 | < 0.1%
-338 | 1 | < 0.1%
-308 | 1 | < 0.1%
-298 | 1 | < 0.1%
-295 | 1 | < 0.1%
-288 | 1 | < 0.1%
-273 | 1 | < 0.1%
-269 | 1 | < 0.1%
-261 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
2001 | 2 | < 0.1%
1999 | 1 | < 0.1%
1995 | 1 | < 0.1%
1992 | 1 | < 0.1%
1991 | 2 | < 0.1%
1990 | 2 | < 0.1%
1980 | 1 | < 0.1%
1979 | 1 | < 0.1%
1977 | 1 | < 0.1%
1976 | 1 | < 0.1%

day_first_booking
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16.60590212
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 9
Median: 17
Q3: 25
95th percentile: 29
Maximum: 31
Range: 30
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 9.02490633
Coefficient of variation (CV): 0.5434758235
Kurtosis: -1.271097189
Mean: 16.60590212
Median Absolute Deviation (MAD): 8
Skewness: -0.09524150734
Sum: 9077633
Variance: 81.44893427
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=31)]
Common values
Value | Count | Frequency (%)
29 | 67560 | 12.4%
5 | 17899 | 3.3%
15 | 17753 | 3.2%
16 | 17634 | 3.2%
23 | 17531 | 3.2%
13 | 17506 | 3.2%
22 | 17502 | 3.2%
3 | 17422 | 3.2%
4 | 17410 | 3.2%
25 | 17340 | 3.2%
Other values (21) | 321094 | 58.7%

Minimum 10 values
Value | Count | Frequency (%)
1 | 14825 | 2.7%
2 | 16249 | 3.0%
3 | 17422 | 3.2%
4 | 17410 | 3.2%
5 | 17899 | 3.3%
6 | 16645 | 3.0%
7 | 15697 | 2.9%
8 | 15892 | 2.9%
9 | 17216 | 3.1%
10 | 16644 | 3.0%

Maximum 10 values
Value | Count | Frequency (%)
31 | 2531 | 0.5%
30 | 9169 | 1.7%
29 | 67560 | 12.4%
28 | 14752 | 2.7%
27 | 15209 | 2.8%
26 | 16328 | 3.0%
25 | 17340 | 3.2%
24 | 16970 | 3.1%
23 | 17531 | 3.2%
22 | 17502 | 3.2%

day_of_week_first_booking
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.223343596
Minimum: 0
Maximum: 6
Zeros: 127650
Zeros (%): 23.4%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 2
Q3: 4
95th percentile: 5
Maximum: 6
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.794642656
Coefficient of variation (CV): 0.8071818764
Kurtosis: -0.9769581362
Mean: 2.223343596
Median Absolute Deviation (MAD): 1
Skewness: 0.3564866122
Sum: 1215393
Variance: 3.220742261
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=7)]
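Common values
Value | Count | Frequency (%)
0 | 127650 | 23.4%
2 | 94534 | 17.3%
1 | 93776 | 17.2%
3 | 85121 | 15.6%
4 | 71454 | 13.1%
5 | 53326 | 9.8%
6 | 20790 | 3.8%

Minimum 10 values
Value | Count | Frequency (%)
0 | 127650 | 23.4%
1 | 93776 | 17.2%
2 | 94534 | 17.3%
3 | 85121 | 15.6%
4 | 71454 | 13.1%
5 | 53326 | 9.8%
6 | 20790 | 3.8%

Maximum 10 values
Value | Count | Frequency (%)
6 | 20790 | 3.8%
5 | 53326 | 9.8%
4 | 71454 | 13.1%
3 | 85121 | 15.6%
2 | 94534 | 17.3%
1 | 93776 | 17.2%
0 | 127650 | 23.4%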

year_account_created
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Top values
2013 | 222821
2012 | 156289
2014 | 112780
2011 | 48454
2010 | 6307

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4
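The length statistics are computed over the string form of each value. Mean length equals Total characters divided by the row count, which is 2186604 / 546651 = 4 exactly for this column. Below is a stdlib sketch. `length_summary` is a hypothetical helper.

```python
import statistics
from typing import Iterable


def length_summary(values: Iterable[object]) -> dict[str, float]:
    """Max, median, mean and min length of the values' string forms."""
    lengths = [len(str(v)) for v in values]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


print(length_summary(["FEMALE", "MALE", "-unknown-", "OTHER"]))
# {'max': 9, 'median': 5.5, 'mean': 6.0, 'min': 4, 'total_characters': 24}
```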

Characters and Unicode

Total characters: 2186604
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2011
2nd row: 2010
3rd row: 2011
4th row: 2010
5th row: 2010

Common Values

Value | Count | Frequency (%)
2013 | 222821 | 40.8%
2012 | 156289 | 28.6%
2014 | 112780 | 20.6%
2011 | 48454 | 8.9%
2010 | 6307 | 1.2%

Length

[Figure: histogram of lengths of the category]

[Figure: category frequency plot]

Most occurring characters

Value | Count | Frequency (%)
2 | 702940 | 32.1%
1 | 595105 | 27.2%
0 | 552958 | 25.3%
3 | 222821 | 10.2%
4 | 112780 | 5.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2186604 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2186604 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2186604 | 100.0%

day_account_created
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.5483334
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 8
Median: 16
Q3: 23
95th percentile: 29
Maximum: 31
Range: 30
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 8.50659347
Coefficient of variation (CV): 0.5471064488
Kurtosis: -1.186168633
Mean: 15.5483334
Median Absolute Deviation (MAD): 7
Skewness: -0.007569097575
Sum: 8499512
Variance: 72.36213246
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=31)]
Common values
Value | Count | Frequency (%)
22 | 19863 | 3.6%
24 | 19734 | 3.6%
3 | 19600 | 3.6%
16 | 19409 | 3.6%
13 | 19394 | 3.5%
23 | 19366 | 3.5%
27 | 19218 | 3.5%
15 | 19010 | 3.5%
21 | 18862 | 3.5%
25 | 18859 | 3.4%
Other values (21) | 353336 | 64.6%

Minimum 10 values
Value | Count | Frequency (%)
1 | 14078 | 2.6%
2 | 16999 | 3.1%
3 | 19600 | 3.6%
4 | 18343 | 3.4%
5 | 18510 | 3.4%
6 | 18057 | 3.3%
7 | 18149 | 3.3%
8 | 18014 | 3.3%
9 | 18745 | 3.4%
10 | 18327 | 3.4%

Maximum 10 values
Value | Count | Frequency (%)
31 | 3100 | 0.6%
30 | 12196 | 2.2%
29 | 16014 | 2.9%
28 | 17644 | 3.2%
27 | 19218 | 3.5%
26 | 18294 | 3.3%
25 | 18859 | 3.4%
24 | 19734 | 3.6%
23 | 19366 | 3.5%
22 | 19863 | 3.6%

day_of_week_account_created
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.457480184
Minimum: 0
Maximum: 6
Zeros: 89920
Zeros (%): 16.4%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 2
Q3: 4
95th percentile: 5
Maximum: 6
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.76668792
Coefficient of variation (CV): 0.718902204
Kurtosis: -0.9641039054
Mean: 2.457480184
Median Absolute Deviation (MAD): 1
Skewness: 0.2654354691
Sum: 1343384
Variance: 3.121186208
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=7)]
Common values
Value | Count | Frequency (%)
2 | 102184 | 18.7%
1 | 100371 | 18.4%
3 | 90626 | 16.6%
0 | 89920 | 16.4%
4 | 77429 | 14.2%
5 | 59675 | 10.9%
6 | 26446 | 4.8%

Minimum 10 values
Value | Count | Frequency (%)
0 | 89920 | 16.4%
1 | 100371 | 18.4%
2 | 102184 | 18.7%
3 | 90626 | 16.6%
4 | 77429 | 14.2%
5 | 59675 | 10.9%
6 | 26446 | 4.8%

Maximum 10 values
Value | Count | Frequency (%)
6 | 26446 | 4.8%
5 | 59675 | 10.9%
4 | 77429 | 14.2%
3 | 90626 | 16.6%
2 | 102184 | 18.7%
1 | 100371 | 18.4%
0 | 89920 | 16.4%

week_of_year_account_created
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 53
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.03569188
Minimum: 1
Maximum: 53
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 1
5th percentile: 4
Q1: 14
Median: 23
Q3: 34
95th percentile: 48
Maximum: 53
Range: 52
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 13.25802809
Coefficient of variation (CV): 0.5515975226
Kurtosis: -0.8693094212
Mean: 24.03569188
Median Absolute Deviation (MAD): 10
Skewness: 0.2749158467
Sum: 13139135
Variance: 175.7753089
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=50)]
Common values
Value | Count | Frequency (%)
19 | 17740 | 3.2%
26 | 17626 | 3.2%
23 | 17613 | 3.2%
21 | 17335 | 3.2%
25 | 16979 | 3.1%
20 | 16292 | 3.0%
24 | 16109 | 2.9%
22 | 16053 | 2.9%
18 | 15847 | 2.9%
17 | 15054 | 2.8%
Other values (43) | 380003 | 69.5%

Minimum 10 values
Value | Count | Frequency (%)
1 | 5566 | 1.0%
2 | 6247 | 1.1%
3 | 9887 | 1.8%
4 | 9732 | 1.8%
5 | 9360 | 1.7%
6 | 11291 | 2.1%
7 | 11144 | 2.0%
8 | 11486 | 2.1%
9 | 11834 | 2.2%
10 | 11559 | 2.1%

Maximum 10 values
Value | Count | Frequency (%)
53 | 2 | < 0.1%
52 | 3196 | 0.6%
51 | 4667 | 0.9%
50 | 6017 | 1.1%
49 | 7086 | 1.3%
48 | 6855 | 1.3%
47 | 7338 | 1.3%
46 | 8032 | 1.5%
45 | 8015 | 1.5%
44 | 6734 | 1.2%

gender
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Top values
FEMALE | 243863
MALE | 215453
-unknown- | 85653
OTHER | 1682

Length

Max length: 9
Median length: 6
Mean length: 5.678718232
Min length: 4

Characters and Unicode

Total characters: 3104277
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MALE
2nd row: FEMALE
3rd row: FEMALE
4th row: -unknown-
5th row: FEMALE

Common Values

Value | Count | Frequency (%)
FEMALE | 243863 | 44.6%
MALE | 215453 | 39.4%
-unknown- | 85653 | 15.7%
OTHER | 1682 | 0.3%

Length

[Figure: histogram of lengths of the category]

[Figure: category frequency plot]
Words
Value | Count | Frequency (%)
female | 243863 | 44.6%
male | 215453 | 39.4%
unknown | 85653 | 15.7%
other | 1682 | 0.3%
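This table counts lowercased words rather than whole values, which is why `-unknown-` appears as `unknown`. The report does not show its exact tokenization rule. The sketch below assumes the values are split on whitespace and that leading and trailing punctuation is stripped from each token. That assumption is consistent with the word tables in this report: `local ops` in first_affiliate_tracked is counted as two words, while `tracked-other` stays whole. `word_counts` is a hypothetical helper.

```python
import string
from collections import Counter
from typing import Iterable


def word_counts(values: Iterable[str]) -> Counter:
    """Lowercase, split on whitespace, strip edge punctuation, count words."""
    counts: Counter = Counter()
    for value in values:
        for token in value.lower().split():
            word = token.strip(string.punctuation)  # "-unknown-" -> "unknown"
            if word:
                counts[word] += 1
    return counts


print(word_counts(["FEMALE", "-unknown-", "local ops", "tracked-other"]))
```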

Most occurring characters

Value | Count | Frequency (%)
E | 704861 | 22.7%
M | 459316 | 14.8%
A | 459316 | 14.8%
L | 459316 | 14.8%
n | 256959 | 8.3%
F | 243863 | 7.9%
- | 171306 | 5.5%
u | 85653 | 2.8%
k | 85653 | 2.8%
o | 85653 | 2.8%
Other values (5) | 92381 | 3.0%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 2333400 | 75.2%
Lowercase Letter | 599571 | 19.3%
Dash Punctuation | 171306 | 5.5%
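These category names are Unicode general categories: Lu (Uppercase Letter), Ll (Lowercase Letter), Pd (Dash Punctuation), Zs (Space Separator) and Nd (Decimal Number). Python's `unicodedata.category` returns the two-letter codes. The sketch below tallies them over every character in a column. The mapping from codes to long names is written by hand and covers only the categories that appear in this report, and `category_counts` is a hypothetical helper.

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Long names for the general-category codes that occur in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Pd": "Dash Punctuation",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
}


def category_counts(values: Iterable[str]) -> Counter:
    """Count characters per Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1  # fall back to the raw code
    return counts


print(category_counts(["MALE", "-unknown-"]))
```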

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
E | 704861 | 30.2%
M | 459316 | 19.7%
A | 459316 | 19.7%
L | 459316 | 19.7%
F | 243863 | 10.5%
O | 1682 | 0.1%
T | 1682 | 0.1%
H | 1682 | 0.1%
R | 1682 | 0.1%

Lowercase Letter
Value | Count | Frequency (%)
n | 256959 | 42.9%
u | 85653 | 14.3%
k | 85653 | 14.3%
o | 85653 | 14.3%
w | 85653 | 14.3%

Dash Punctuation
Value | Count | Frequency (%)
- | 171306 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2932971 | 94.5%
Common | 171306 | 5.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
E | 704861 | 24.0%
M | 459316 | 15.7%
A | 459316 | 15.7%
L | 459316 | 15.7%
n | 256959 | 8.8%
F | 243863 | 8.3%
u | 85653 | 2.9%
k | 85653 | 2.9%
o | 85653 | 2.9%
w | 85653 | 2.9%
Other values (4) | 6728 | 0.2%

Common
Value | Count | Frequency (%)
- | 171306 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3104277 | 100.0%

age
Real number (ℝ≥0)

Distinct: 99
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.41247158
Minimum: 16
Maximum: 115
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 4.2 MiB

Quantile statistics

Minimum: 16
5th percentile: 23
Q1: 28
Median: 34
Q3: 43
95th percentile: 63
Maximum: 115
Range: 99
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 14.00167861
Coefficient of variation (CV): 0.3742516338
Kurtosis: 5.989173921
Mean: 37.41247158
Median Absolute Deviation (MAD): 6
Skewness: 2.002420326
Sum: 20451565
Variance: 196.047004
Monotonicity: Not monotonic
[Figure: histogram with fixed-size bins (bins=50)]
Common values
Value | Count | Frequency (%)
30 | 27868 | 5.1%
31 | 27503 | 5.0%
32 | 26907 | 4.9%
29 | 26399 | 4.8%
28 | 24123 | 4.4%
34 | 23292 | 4.3%
33 | 22965 | 4.2%
27 | 22097 | 4.0%
35 | 20400 | 3.7%
25 | 19094 | 3.5%
Other values (89) | 306003 | 56.0%

Minimum 10 values
Value | Count | Frequency (%)
16 | 26 | < 0.1%
17 | 81 | < 0.1%
18 | 3435 | 0.6%
19 | 6529 | 1.2%
20 | 1589 | 0.3%
21 | 5727 | 1.0%
22 | 9441 | 1.7%
23 | 11905 | 2.2%
24 | 15017 | 2.7%
25 | 19094 | 3.5%

Maximum 10 values
Value | Count | Frequency (%)
115 | 12 | < 0.1%
113 | 4 | < 0.1%
112 | 1 | < 0.1%
111 | 2 | < 0.1%
110 | 400 | 0.1%
109 | 35 | < 0.1%
108 | 15 | < 0.1%
107 | 23 | < 0.1%
106 | 23 | < 0.1%
105 | 6038 | 1.1%

signup_method
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Top values
basic | 344824
facebook | 201279
google | 548

Length

Max length: 8
Median length: 5
Mean length: 6.105614002
Min length: 5

Characters and Unicode

Total characters: 3337640
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: facebook
2nd row: basic
3rd row: facebook
4th row: basic
5th row: basic

Common Values

Value | Count | Frequency (%)
basic | 344824 | 63.1%
facebook | 201279 | 36.8%
google | 548 | 0.1%

Length

[Figure: histogram of lengths of the category]

[Figure: category frequency plot]

Most occurring characters

Value | Count | Frequency (%)
b | 546103 | 16.4%
a | 546103 | 16.4%
c | 546103 | 16.4%
o | 403654 | 12.1%
s | 344824 | 10.3%
i | 344824 | 10.3%
e | 201827 | 6.0%
f | 201279 | 6.0%
k | 201279 | 6.0%
g | 1096 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3337640 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3337640 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3337640 | 100.0%

language
Categorical

Distinct: 25
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Top values
en | 532305
fr | 3420
es | 2296
de | 2236
zh | 1694
Other values (20) | 4700

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 1093302
Distinct characters: 19
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: en
2nd row: en
3rd row: en
4th row: en
5th row: en

Common Values

Value | Count | Frequency (%)
en | 532305 | 97.4%
fr | 3420 | 0.6%
es | 2296 | 0.4%
de | 2236 | 0.4%
zh | 1694 | 0.3%
it | 1042 | 0.2%
ko | 909 | 0.2%
ru | 705 | 0.1%
nl | 367 | 0.1%
pt | 359 | 0.1%
Other values (15) | 1318 | 0.2%
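Each Common Values table lists the ten most frequent values and folds the rest into an "Other values (m)" row. Here m is the number of remaining distinct values, 25 - 10 = 15 for language. Below is a stdlib sketch of that layout. `top_values` is a hypothetical helper, and the toy input is illustrative only.

```python
from collections import Counter
from typing import Hashable, Iterable


def top_values(values: Iterable[Hashable], k: int = 10) -> list[tuple[str, int, float]]:
    """Top-k (value, count, percent) rows plus an 'Other values (m)' row."""
    counts = Counter(values)
    total = sum(counts.values())
    ranked = counts.most_common()  # sorted by count, descending
    rows = [(str(v), c, 100 * c / total) for v, c in ranked[:k]]
    rest = ranked[k:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100 * other / total))
    return rows


for label, count, pct in top_values(["en"] * 6 + ["fr"] * 2 + ["de", "ko"], k=2):
    print(f"{label} | {count} | {pct:.1f}%")
```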

Length

[Figure: histogram of lengths of the category]

Most occurring characters

Value | Count | Frequency (%)
e | 536922 | 49.1%
n | 532766 | 48.7%
r | 4238 | 0.4%
f | 3447 | 0.3%
s | 2681 | 0.2%
d | 2358 | 0.2%
h | 1731 | 0.2%
z | 1694 | 0.2%
t | 1529 | 0.1%
i | 1094 | 0.1%
Other values (9) | 4842 | 0.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1093302 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1093302 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1093302 | 100.0%

affiliate_channel
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Top values
direct | 365381
sem-brand | 71891
sem-non-brand | 47479
seo | 24102
other | 15816
Other values (3) | 21982

Length

Max length: 13
Median length: 6
Mean length: 6.801626632
Min length: 3

Characters and Unicode

Total characters: 3718116
Distinct characters: 17
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: seo
2nd row: direct
3rd row: direct
4th row: direct
5th row: other

Common Values

Value | Count | Frequency (%)
direct | 365381 | 66.8%
sem-brand | 71891 | 13.2%
sem-non-brand | 47479 | 8.7%
seo | 24102 | 4.4%
other | 15816 | 2.9%
api | 13700 | 2.5%
content | 5501 | 1.0%
remarketing | 2781 | 0.5%

Length

[Figure: histogram of lengths of the category]

[Figure: category frequency plot]

Most occurring characters

Value | Count | Frequency (%)
e | 535732 | 14.4%
r | 506129 | 13.6%
d | 484751 | 13.0%
t | 394980 | 10.6%
i | 381862 | 10.3%
c | 370882 | 10.0%
n | 228111 | 6.1%
- | 166849 | 4.5%
s | 143472 | 3.9%
a | 135851 | 3.7%
Other values (7) | 369497 | 9.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3551267 | 95.5%
Dash Punctuation | 166849 | 4.5%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 535732 | 15.1%
r | 506129 | 14.3%
d | 484751 | 13.7%
t | 394980 | 11.1%
i | 381862 | 10.8%
c | 370882 | 10.4%
n | 228111 | 6.4%
s | 143472 | 4.0%
a | 135851 | 3.8%
m | 122151 | 3.4%
Other values (6) | 247346 | 7.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 166849 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3551267 | 95.5%
Common | 166849 | 4.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 535732 | 15.1%
r | 506129 | 14.3%
d | 484751 | 13.7%
t | 394980 | 11.1%
i | 381862 | 10.8%
c | 370882 | 10.4%
n | 228111 | 6.4%
s | 143472 | 4.0%
a | 135851 | 3.8%
m | 122151 | 3.4%
Other values (6) | 247346 | 7.0%

Common
Value | Count | Frequency (%)
- | 166849 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3718116 | 100.0%

first_affiliate_tracked
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Top values
untracked | 300967
linked | 119437
omg | 110115
tracked-other | 12068
product | 3735
Other values (2) | 329

Length

Max length: 13
Median length: 9
Mean length: 7.210560303
Min length: 3

Characters and Unicode

Total characters: 3941660
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: untracked
2nd row: untracked
3rd row: untracked
4th row: untracked
5th row: untracked

Common Values

Value | Count | Frequency (%)
untracked | 300967 | 55.1%
linked | 119437 | 21.8%
omg | 110115 | 20.1%
tracked-other | 12068 | 2.2%
product | 3735 | 0.7%
marketing | 236 | < 0.1%
local ops | 93 | < 0.1%

Length

[Figure: histogram of lengths of the category]

[Figure: category frequency plot]
Words
Value | Count | Frequency (%)
untracked | 300967 | 55.0%
linked | 119437 | 21.8%
omg | 110115 | 20.1%
tracked-other | 12068 | 2.2%
product | 3735 | 0.7%
marketing | 236 | < 0.1%
local | 93 | < 0.1%
ops | 93 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
e | 444776 | 11.3%
d | 436207 | 11.1%
k | 432708 | 11.0%
n | 420640 | 10.7%
t | 329074 | 8.3%
r | 329074 | 8.3%
c | 316863 | 8.0%
a | 313364 | 8.0%
u | 304702 | 7.7%
o | 126104 | 3.2%
Other values (9) | 488148 | 12.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3929499 | 99.7%
Dash Punctuation | 12068 | 0.3%
Space Separator | 93 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 444776 | 11.3%
d | 436207 | 11.1%
k | 432708 | 11.0%
n | 420640 | 10.7%
t | 329074 | 8.4%
r | 329074 | 8.4%
c | 316863 | 8.1%
a | 313364 | 8.0%
u | 304702 | 7.8%
o | 126104 | 3.2%
Other values (7) | 475987 | 12.1%

Dash Punctuation
Value | Count | Frequency (%)
- | 12068 | 100.0%

Space Separator
Value | Count | Frequency (%)
(space) | 93 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3929499 | 99.7%
Common | 12161 | 0.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 444776 | 11.3%
d | 436207 | 11.1%
k | 432708 | 11.0%
n | 420640 | 10.7%
t | 329074 | 8.4%
r | 329074 | 8.4%
c | 316863 | 8.1%
a | 313364 | 8.0%
u | 304702 | 7.8%
o | 126104 | 3.2%
Other values (7) | 475987 | 12.1%

Common
Value | Count | Frequency (%)
- | 12068 | 99.2%
(space) | 93 | 0.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3941660 | 100.0%

signup_app
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Top values
Web | 504185
iOS | 28680
Moweb | 7944
Android | 5842

Length

Max length: 7
Median length: 3
Mean length: 3.071811814
Min length: 3

Characters and Unicode

Total characters: 1679209
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Web
2nd row: Web
3rd row: Web
4th row: Web
5th row: Web

Common Values

Value | Count | Frequency (%)
Web | 504185 | 92.2%
iOS | 28680 | 5.2%
Moweb | 7944 | 1.5%
Android | 5842 | 1.1%

Length

[Figure: histogram of lengths of the category]

[Figure: category frequency plot]
Words
Value | Count | Frequency (%)
web | 504185 | 92.2%
ios | 28680 | 5.2%
moweb | 7944 | 1.5%
android | 5842 | 1.1%

Most occurring characters

Value | Count | Frequency (%)
e | 512129 | 30.5%
b | 512129 | 30.5%
W | 504185 | 30.0%
i | 34522 | 2.1%
O | 28680 | 1.7%
S | 28680 | 1.7%
o | 13786 | 0.8%
d | 11684 | 0.7%
M | 7944 | 0.5%
w | 7944 | 0.5%
Other values (3) | 17526 | 1.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1103878 | 65.7%
Uppercase Letter | 575331 | 34.3%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 512129 | 46.4%
b | 512129 | 46.4%
i | 34522 | 3.1%
o | 13786 | 1.2%
d | 11684 | 1.1%
w | 7944 | 0.7%
n | 5842 | 0.5%
r | 5842 | 0.5%

Uppercase Letter
Value | Count | Frequency (%)
W | 504185 | 87.6%
O | 28680 | 5.0%
S | 28680 | 5.0%
M | 7944 | 1.4%
A | 5842 | 1.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1679209 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1679209 | 100.0%

country_destination
Categorical

HIGH CORRELATION

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 MiB
Top values
NDF | 54850
GB | 52705
ES | 50518
US | 47704
NL | 47597
Other values (7) | 293277

Length

Max length: 5
Median length: 2
Mean length: 2.346374561
Min length: 2

Characters and Unicode

Total characters: 1282648
Distinct characters: 20
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NDF
2nd row: US
3rd row: other
4th row: US
5th row: US

Common Values

Value | Count | Frequency (%)
NDF | 54850 | 10.0%
GB | 52705 | 9.6%
ES | 50518 | 9.2%
US | 47704 | 8.7%
NL | 47597 | 8.7%
PT | 47100 | 8.6%
other | 44832 | 8.2%
FR | 43937 | 8.0%
CA | 42544 | 7.8%
IT | 40225 | 7.4%
Other values (2) | 74639 | 13.7%

Length

[Figure: histogram of lengths of the category]
Words
Value | Count | Frequency (%)
ndf | 54850 | 10.0%
gb | 52705 | 9.6%
es | 50518 | 9.2%
us | 47704 | 8.7%
nl | 47597 | 8.7%
pt | 47100 | 8.6%
other | 44832 | 8.2%
fr | 43937 | 8.0%
ca | 42544 | 7.8%
it | 40225 | 7.4%
Other values (2) | 74639 | 13.7%

Most occurring characters

Value | Count | Frequency (%)
N | 102447 | 8.0%
F | 98787 | 7.7%
S | 98222 | 7.7%
D | 92685 | 7.2%
E | 88353 | 6.9%
T | 87325 | 6.8%
U | 84508 | 6.6%
A | 79348 | 6.2%
B | 52705 | 4.1%
G | 52705 | 4.1%
Other values (10) | 445563 | 34.7%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 1058488 | 82.5%
Lowercase Letter | 224160 | 17.5%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
N | 102447 | 9.7%
F | 98787 | 9.3%
S | 98222 | 9.3%
D | 92685 | 8.8%
E | 88353 | 8.3%
T | 87325 | 8.2%
U | 84508 | 8.0%
A | 79348 | 7.5%
B | 52705 | 5.0%
G | 52705 | 5.0%
Other values (5) | 221403 | 20.9%

Lowercase Letter
Value | Count | Frequency (%)
o | 44832 | 20.0%
t | 44832 | 20.0%
h | 44832 | 20.0%
e | 44832 | 20.0%
r | 44832 | 20.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1282648 | 100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1282648  100.0%

Most frequent character per block

ASCII
Value              Count    Frequency (%)
N                  102447   8.0%
F                  98787    7.7%
S                  98222    7.7%
D                  92685    7.2%
E                  88353    6.9%
T                  87325    6.8%
U                  84508    6.6%
A                  79348    6.2%
B                  52705    4.1%
G                  52705    4.1%
Other values (10)  445563   34.7%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1. A value of -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations. In other words, ρ is Pearson's r computed on the ranks.
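The recipe above (rank both variables, then apply Pearson's formula to the ranks) can be sketched in pure Python. Tied values receive the average of the positions they occupy, which is the usual convention. This is an illustrative sketch on made-up numbers, not the report's own implementation.

```python
from math import sqrt


def pearson(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/n normalisation factors cancel, so plain sums are enough.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but perfectly monotonic

print(spearman(x, y))  # ≈ 1.0
print(pearson(x, y))   # ≈ 0.943, below 1 because the relation is not linear
```

The cubic example shows the point made above: the relationship is perfectly monotonic but not linear, so ρ reaches 1 while r does not.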

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1. A value of -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate shifts in location and positive rescalings of the two variables; a negative scale factor only flips its sign. For an exactly linear relationship, therefore, the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
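Both the definition and the invariance property can be checked numerically. Below is a small pure-Python sketch on made-up numbers, which are not taken from this dataset.

```python
from math import sqrt


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


x = [1.0, 2.0, 4.0, 7.0, 11.0]  # hypothetical data
y = [2.0, 1.0, 5.0, 6.0, 12.0]

r = pearson(x, y)
# Shift and positively rescale each variable independently: r is unchanged
r_shifted_scaled = pearson([3 * a + 10 for a in x], [0.5 * b - 4 for b in y])
# A negative scale factor on one variable flips the sign of r
r_flipped = pearson([-a for a in x], y)

print(r, r_shifted_scaled, r_flipped)
```

For an exact line such as y = 5x + 2 the result is +1, and for y = -0.1x it is -1, regardless of how steep the line is.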

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1. A value of -1 indicates perfect negative correlation, 0 indicates no correlation, and +1 indicates perfect positive correlation.

To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the τ_a form; when there are tied values, the τ_b variant adjusts the denominator for ties.
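The counting procedure above translates directly into code. Below is a minimal pure-Python sketch of τ_a on made-up data. It checks all n(n-1)/2 pairs, so it is only practical for small inputs, not for a column of 546651 rows. Libraries such as SciPy (`scipy.stats.kendalltau`) default to the tie-corrected τ_b variant instead.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Pairs tied in x or y count as neither concordant nor discordant.
    """
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data: 5 concordant pairs and 1 discordant pair out of 6
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```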

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The usual empirical estimator of Cramér's V is known to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
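The Bergsma (2013) correction shrinks φ² = χ²/n by its expected value under independence. It also shrinks the table dimensions before taking the ratio. Below is a pure-Python sketch of that corrected statistic for two sequences of category labels. It follows the published formula, but it is not necessarily identical to the report's code. For example, implementations differ in whether a continuity correction is applied to 2x2 tables, and this sketch applies none.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V (Bergsma 2013) for two nominal sequences."""
    n = len(x)
    if n < 2:
        return 0.0
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)
    # Pearson chi-squared statistic over the full r x k contingency table
    chi2 = 0.0
    for a, na in row_totals.items():
        for b, nb in col_totals.items():
            expected = na * nb / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the effective table dimensions
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:  # a variable with a single category carries no association
        return 0.0
    return sqrt(phi2_corr / denom)


# Hypothetical data: y is fully determined by x, so V should be 1
x = ["a"] * 50 + ["b"] * 50
y = ["u"] * 50 + ["v"] * 50
print(cramers_v_corrected(x, y))
```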

Phik (φk)

Phik (φk) is a new and practical correlation coefficient. It works consistently between categorical, ordinal and interval variables, captures nonlinear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
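Both views are built from the same per-cell "present or missing" flags. The sketch below computes per-column non-missing counts (the bar chart) and a 0/1 nullity matrix (1 = value present) for records stored as dictionaries. It assumes that a cell is missing only when it is `None`. Under that assumption, a placeholder string such as `-unknown-` in the gender column counts as present. That convention is a choice made for this sketch, and the records are hypothetical.

```python
def nullity_summary(rows, columns):
    """Per-column non-missing counts and a 0/1 nullity matrix (1 = value present).

    A cell counts as missing only when it is None (or the key is absent).
    """
    non_missing = {c: sum(1 for row in rows if row.get(c) is not None) for c in columns}
    matrix = [[0 if row.get(c) is None else 1 for c in columns] for row in rows]
    missing_cells = sum(len(columns) - sum(line) for line in matrix)
    return non_missing, matrix, missing_cells


rows = [  # hypothetical records
    {"gender": "FEMALE", "age": 38},
    {"gender": None, "age": 56},
    {"gender": "-unknown-", "age": None},
]
counts, matrix, missing = nullity_summary(rows, ["gender", "age"])
print(counts)   # {'gender': 2, 'age': 2}
print(matrix)   # [[1, 1], [0, 1], [1, 0]]
print(missing)  # 2
```

For this dataset the report lists 0 missing cells, so every column's bar would be full and the matrix would contain only 1s.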

Sample

First rows

signup_flowdays_from_first_active_until_account_createddays_from_account_created_until_first_bookingday_first_bookingday_of_week_first_bookingyear_account_createdday_account_createdday_of_week_account_createdweek_of_year_account_createdgenderagesignup_methodlanguageaffiliate_channelfirst_affiliate_trackedsignup_appcountry_destination
007321496290201125221MALE38facebookenseountrackedWebNDF
13476-5720201028139FEMALE56basicendirectuntrackedWebUS
207652788520115049FEMALE42facebookendirectuntrackedWebother
30280-208183201014137-unknown-41basicendirectuntrackedWebUS
40035120102553FEMALE46basicenotheruntrackedWebUS
5001013220103653FEMALE47basicendirectomgWebUS
6002062932010401FEMALE50basicenotheruntrackedWebUS
7000402010401-unknown-46basicenotheromgWebUS
8002622010401FEMALE36basicenotheruntrackedWebUS
90020012902010511FEMALE47basicenotheruntrackedWebNDF

Last rows

signup_flowdays_from_first_active_until_account_createddays_from_account_created_until_first_bookingday_first_bookingday_of_week_first_bookingyear_account_createdday_account_createdday_of_week_account_createdweek_of_year_account_createdgenderagesignup_methodlanguageaffiliate_channelfirst_affiliate_trackedsignup_appcountry_destination
546641001193201218251FEMALE25basicendirectuntrackedWebother
5466422507419420145212FEMALE30basicendirectuntrackediOSother
546643001202201319147MALE25basickosem-non-brandomgWebother
5466440012220131140FEMALE41basicendirectomgWebother
546645001224201221312FEMALE33basicendirectlinkedWebother
546646000210201421017-unknown-41basicensem-brandomgWebother
546647005526520122027MALE28facebookensem-brandomgWebother
5466480022123201421221-unknown-30basicendirectuntrackedWebother
54664900215220141303FEMALE30basicendirectlinkedWebother
546650101230201222629FEMALE24basicendirectlinkedWebother

Duplicate rows

Most frequently occurring

signup_flowdays_from_first_active_until_account_createddays_from_account_created_until_first_bookingday_first_bookingday_of_week_first_bookingyear_account_createdday_account_createdday_of_week_account_createdweek_of_year_account_createdgenderagesignup_methodlanguageaffiliate_channelfirst_affiliate_trackedsignup_appcountry_destination# duplicates
6749000155201315524FEMALE31facebookendirectuntrackedWebPT132
113000122012129FEMALE40basicensem-brandlinkedWebPT127
25310006320136323FEMALE25basicendirectuntrackedWebPT112
1124300025520132544FEMALE25facebookendirectlinkedWebPT107
9966000230201423026FEMALE21facebookenseountrackedWebPT97
12049000273201327326FEMALE45basicensem-non-brandomgWebPT96
9130003020133010-unknown-35basicensem-brandomgWebPT95
10410003120133123FEMALE19basicendirectuntrackedWebPT90
19540005120135110MALE25facebookensem-brandomgWebPT86
5808000135201313527-unknown-36basicendirectuntrackedWebPT86
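Duplicate detection can be sketched by counting identical full rows. Below, every occurrence after the first counts as a duplicate row, which is the same convention as pandas' `DataFrame.duplicated()` with its default `keep='first'`. Reading the "# duplicates" column above as the number of times each row occurs is an assumption; the report's exact definition may differ. The records are hypothetical.

```python
from collections import Counter


def duplicate_summary(rows, top_n=10):
    """Count repeated rows and list the most frequently repeated ones."""
    counts = Counter(tuple(row) for row in rows)
    # Every occurrence after the first counts as a duplicate row
    n_duplicate_rows = len(rows) - len(counts)
    most_frequent = [(row, c) for row, c in counts.most_common(top_n) if c > 1]
    return n_duplicate_rows, most_frequent


rows = [  # hypothetical (signup_flow, gender, country_destination) records
    (0, "FEMALE", "PT"),
    (0, "FEMALE", "PT"),
    (1, "MALE", "US"),
    (0, "FEMALE", "PT"),
    (1, "MALE", "US"),
    (2, "MALE", "NDF"),
]
n_dup, top = duplicate_summary(rows)
print(n_dup)  # 3
print(top)    # [((0, 'FEMALE', 'PT'), 3), ((1, 'MALE', 'US'), 2)]
```

As a check on the overview figures, 84918 duplicate rows out of 546651 observations is about 15.5%.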